1 Examples in \(d=2\)

We iteratively optimize the objective function \[ \lambda\underbrace{\sum_{i=1}^n{\sum_{k=1}^{d-1}}\left(\frac{1}{\sqrt{d}}\|{\mathbf{P}_k}\mathbf{X}_i\|-1\right)^2}_{\substack{\text{Proj's distance to}\\ \sqrt{d}\times\mathbb{S}^{2 d-1}\supset(\mathbb{S}^1)^d}}+(1-\lambda)\underbrace{\sum_{i=1}^n\|(\mathbf{I}-\mathbf{P})\mathbf{X}_i\|^2}_{\substack{\text{Complement to}\\\text{proj's variation}}} \] We set \(\lambda=0.5\) in the next plots and use the raw scores (no rescaling; see Issue 1). The legend for the next plots:
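As a sanity check, the objective above can be evaluated directly. The following is a minimal sketch, assuming the rows of X are the data embedded in \(\mathbb{R}^{2d}\) and that the projection matrices \(\mathbf{P}_k\) and \(\mathbf{P}\) are supplied externally (their construction is not specified here, so the interface is hypothetical):

```r
# Hypothetical evaluator of the lambda-weighted objective. X: n x 2d matrix
# of embedded data; P_list: list of the d - 1 projection matrices P_k;
# P: the projection whose complement measures the lost variation.
objective <- function(X, P_list, P, lambda) {
  d <- ncol(X) / 2
  # First term: squared distances of the projections to sqrt(d) * S^(2d - 1)
  dist_sphere <- sum(sapply(P_list, function(Pk) {
    (sqrt(rowSums((X %*% t(Pk))^2)) / sqrt(d) - 1)^2
  }))
  # Second term: squared norms of the components removed by P
  compl_var <- sum(rowSums((X %*% t(diag(ncol(X)) - P))^2))
  lambda * dist_sphere + (1 - lambda) * compl_var
}

# Sanity check: data exactly on (S^1)^2 and identity projections make
# both terms vanish
theta <- matrix(runif(20, min = -pi, max = pi), ncol = 2)
X <- cbind(cos(theta[, 1]), sin(theta[, 1]), cos(theta[, 2]), sin(theta[, 2]))
I4 <- diag(4)
objective(X, P_list = list(I4), P = I4, lambda = 0.5)  # 0 (up to rounding)
```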

1.1 Steve’s dataset

## Reduction to dimension d = 2. Time: 0.1 seconds.
## Reduction to dimension d = 1. Time: 0.309 seconds.

1.2 Uniform-like

Uniform in \([-\pi,\pi)^2\)

## Reduction to dimension d = 2. Time: 0.142 seconds.
## Reduction to dimension d = 1. Time: 0.347 seconds.

Uniform vertical band

## Reduction to dimension d = 2. Time: 0.262 seconds.
## Reduction to dimension d = 1. Time: 0.461 seconds.

1.3 Diagonal-like

Diagonal \(y=x\)

## Reduction to dimension d = 2. Time: 0.085 seconds.
## Reduction to dimension d = 1. Time: 0.349 seconds.

Diagonal \(y=x-2\)

## Reduction to dimension d = 2. Time: 0.054 seconds.
## Reduction to dimension d = 1. Time: 0.235 seconds.

1.4 Trigonometric-like

\(\sin(x)\)

## Reduction to dimension d = 2. Time: 0.056 seconds.
## Reduction to dimension d = 1. Time: 0.237 seconds.

Shorter-support \(\sin(x)\)

## Reduction to dimension d = 2. Time: 0.058 seconds.
## Reduction to dimension d = 1. Time: 0.232 seconds.

1.5 Gaussian-like

Non-isotropic

## Reduction to dimension d = 2. Time: 0.064 seconds.
## Reduction to dimension d = 1. Time: 0.342 seconds.

Rotated

## Reduction to dimension d = 2. Time: 0.148 seconds.
## Reduction to dimension d = 1. Time: 0.337 seconds.

1.6 Clusters

Two separated clusters

## Reduction to dimension d = 2. Time: 0.164 seconds.
## Reduction to dimension d = 1. Time: 0.431 seconds.

Three clusters

## Reduction to dimension d = 2. Time: 0.192 seconds.
## Reduction to dimension d = 1. Time: 0.495 seconds.

2 Effect of \(\lambda\)

Legend for the next plots:

Three sparse separated clusters

## Reduction to dimension d = 2. Time: 0.259 seconds.
## Reduction to dimension d = 1. Time: 0.533 seconds.
## Reduction to dimension d = 2. Time: 0.252 seconds.
## Reduction to dimension d = 1. Time: 0.62 seconds.

Three concentrated separated clusters

## Reduction to dimension d = 2. Time: 1.029 seconds.
## Reduction to dimension d = 1. Time: 1.893 seconds.
## Reduction to dimension d = 2. Time: 0.925 seconds.
## Reduction to dimension d = 1. Time: 1.741 seconds.

Shorter-support \(\sin(x)\)

## Reduction to dimension d = 2. Time: 0.3 seconds.
## Reduction to dimension d = 1. Time: 0.701 seconds.
## Reduction to dimension d = 2. Time: 0.291 seconds.
## Reduction to dimension d = 1. Time: 0.727 seconds.

3 Issue 1: re-scaling vs. not re-scaling the raw scores

In the way we construct the (raw) scores, we have that, in \(d=2\): \[\begin{align*} \text{Score}_1=&\,\mathrm{Distance\_along\_curve\_from\_projection\_to\_mean}\in\left[-l/2,l/2\right),l\leq4\pi,\\ \text{Score}_2=&\,\mathrm{Distance\_from\_point\_to\_curve\_projection}\in\left[-\frac{\sqrt{2}\pi}{2},\frac{\sqrt{2}\pi}{2}\right). \end{align*}\]

Therefore, \((\text{Score}_1,\text{Score}_2)\not\in[-\pi,\pi)^2\) in general. Setting distanceScaled = TRUE multiplies these scores by \(\frac{\pi}{l/2}\) and \(\frac{\pi}{\sqrt{2} \pi / 2}\), respectively, in order to force them to be in \([-\pi,\pi)\).
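A minimal sketch of this rescaling in \(d=2\), assuming the raw scores come as a two-column matrix and that the curve length \(l\) is known (the function name is hypothetical, not the package's API):

```r
# Hypothetical helper: map raw (Score_1, Score_2) into [-pi, pi)^2 as
# described above, for a curve of length l
rescale_scores <- function(scores, l) {
  cbind(scores[, 1] * pi / (l / 2),
        scores[, 2] * pi / (sqrt(2) * pi / 2))
}

# The extremes of the raw ranges map to pi
rescale_scores(cbind(4 * pi / 2, sqrt(2) * pi / 2), l = 4 * pi)  # (pi, pi)
```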

Recall that for a random variable \(X\) with \(\mathrm{supp}(X)=(a,b)\), \(\mathbb{V}\mathrm{ar}[X]\leq\frac{(b-a)^2}{4}\). This means that \(\mathbb{V}\mathrm{ar}[\text{Score}_1]\leq\frac{l^2}{4}\leq 4\pi^2\) and \(\mathbb{V}\mathrm{ar}[\text{Score}_2]\leq\frac{(\sqrt{2}\pi)^2}{4}=\frac{\pi^2}{2}\).

For general \(d\), \[ \text{Score}_d=\mathrm{Distance\_from\_point\_to\_surface\_projection}\in\left[-\frac{\sqrt{d}\pi}{2},\frac{\sqrt{d}\pi}{2}\right) \] and therefore \(\mathbb{V}\mathrm{ar}[\text{Score}_d]\leq\frac{1}{4}\left(\sqrt{d}\pi\right)^2=\frac{d}{4}\pi^2=:C_d\). For the first score, \(\mathbb{V}\mathrm{ar}[\text{Score}_1]\leq\frac{l^2}{4}\leq\frac{(4\pi)^2}{4}=4\pi^2=:C_1\). This means that \[ C_2<C_3<\ldots<C_{d-1}<C_{d} \] and that \[ C_{1}=C_{16}>C_{15}>\ldots>C_2,\quad C_1<C_{17}<C_{18}<\ldots<C_d. \] Setting distanceScaled = FALSE (raw scores) or distanceScaled = TRUE (applies rescaling) therefore has consequences on the relative weight of each score, as illustrated below.
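The orderings of the bounds \(C_d\) can be checked numerically; a small sketch:

```r
# Upper bounds on the score variances: C_1 = 4 * pi^2 for the first score
# (range of length at most 4 * pi), C_d = d * pi^2 / 4 for d >= 2
C <- function(d) ifelse(d == 1, 4 * pi^2, d * pi^2 / 4)

C(1) == C(16)  # TRUE: the first score's bound equals C_16
C(1) > C(15)   # TRUE
C(1) < C(17)   # TRUE
```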

Shorter-support \(\sin(x)\)

## Reduction to dimension d = 2. Time: 0.119 seconds.
## Reduction to dimension d = 1. Time: 0.353 seconds.
## Reduction to dimension d = 2. Time: 0.152 seconds.
## Reduction to dimension d = 1. Time: 0.331 seconds.

Non-isotropic Gaussian

## Reduction to dimension d = 2. Time: 0.061 seconds.
## Reduction to dimension d = 1. Time: 0.229 seconds.
## Reduction to dimension d = 2. Time: 0.056 seconds.
## Reduction to dimension d = 1. Time: 0.281 seconds.

Rotated Gaussian

## Reduction to dimension d = 2. Time: 0.061 seconds.
## Reduction to dimension d = 1. Time: 0.321 seconds.
## Reduction to dimension d = 2. Time: 0.056 seconds.
## Reduction to dimension d = 1. Time: 0.236 seconds.

4 Issue 2: eccentricity of the curves

Some choices of the vectors \(\mathbf{u}\) and \(\mathbf{v}\) produce more squarish curves than others. Squarish fits tend to yield degenerate projections onto the corners. This is likely more problematic in higher dimensions: once the projections collapse, they remain degenerate for the lower-dimensional fits, hence producing a sequence of degenerate scores.

I do not know which parametrization of \(\mathbf{u}\) and \(\mathbf{v}\) produces squarish curves (squarishness can be characterized by the curve length, which is close to \(4\pi\) for the squarish ones). It seems related to the entries of \(\mathbf{u}\) and \(\mathbf{v}\) being close to \(0\) (respectively, close to \(1\)), as the following empirical evidence suggests (regressions of the lengths on the entries of the vectors):

# Sample random curves and evaluate their lengths
# Hu() and distPC1Curve() come from the project's codebase
library(ggplot2)
library(gridExtra)
M <- 1e4
l <- numeric(M)
x <- matrix(nrow = M, ncol = 4)
y <- matrix(nrow = M, ncol = 3)
u <- v <- matrix(nrow = M, ncol = 4)
for (i in 1:M) {

  # Sample u uniformly on S^3 and v uniformly on S^2, the latter mapped
  # into the orthogonal complement of u
  x[i, ] <- rnorm(n = 4)
  x[i, ] <- x[i, ] / sqrt(sum(x[i, ]^2))
  y[i, ] <- rnorm(n = 3)
  y[i, ] <- y[i, ] / sqrt(sum(y[i, ]^2))
  u[i, ] <- x[i, ]
  v[i, ] <- y[i, ] %*% Hu(u = u[i, ])[-1, ]
  l[i] <- distPC1Curve(alpha = c(0, 2 * pi - 1e-4), u = u[i, ], v = v[i, ],
                       N = 1e3, shortest = FALSE, der = FALSE)

}

# Reshape to long format: one row per (curve, vector, entry)
df <- data.frame("l" = l, "u" = u, "v" = v)
df <- reshape(df, direction = "long",
              varying = c("u.1", "u.2", "u.3", "u.4", "v.1", "v.2", "v.3", "v.4"),
              times = c("u", "v"), timevar = "k")
df <- reshape2::melt(df, measure.vars = c("u", "v"))
df$id <- NULL
df$k <- as.factor(df$k)
names(df) <- c("len", "k", "uv", "vec")
grid.arrange(
  ggplot(data = df[df$uv == "u", ], mapping = aes(x = vec, y = len, colour = k)) +
    geom_smooth(se = FALSE) + ggtitle("Regression of the curve length on the entries of u"),
  ggplot(data = df[df$uv == "v", ], mapping = aes(x = vec, y = len, colour = k)) +
    geom_smooth(se = FALSE) + ggtitle("Regression of the curve length on the entries of v")
)

If we could characterize when the curves (and likewise the surfaces) are squarish, we could add a small penalty to the objective function to avoid degenerate solutions.
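The curve length that characterizes squarishness can also be approximated without distPC1Curve, by a polygonal sum over any parametrization \(t\mapsto\gamma(t)\); a generic sketch (the function is an illustration, not the package's implementation):

```r
# Polygonal approximation of the length of a parametric curve gamma,
# evaluated on a grid of N points in [from, to]
curve_length <- function(gamma, from = 0, to = 2 * pi, N = 1e3) {
  t <- seq(from, to, length.out = N)
  pts <- t(sapply(t, gamma))            # N x m matrix of curve points
  sum(sqrt(rowSums(diff(pts)^2)))       # sum of segment lengths
}

# Example: the unit circle has length 2 * pi
curve_length(function(t) c(cos(t), sin(t)))  # approx. 6.2832
```

A length-based penalty could then push the fits away from the \(4\pi\) (squarish) regime.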